The Variety problem in Big Earth Data from Satellites
Variety = Choice


Choice = Good 


(Right?) 


The Earth Observing System Data and 
Information System (EOSDIS) 
The Variety problem in Big Earth Data
from Satellites 


Distinct Science Products Distributed 
Earthdata Search

[Screenshot: Earthdata Search landing page. Search NASA Earth Science data by keyword and filter by time or space. Browse all data: see featured collections or use categories to narrow your results. Example listing under "Recent and Featured": BUV/Nimbus-4 Ozone (O3) Profile and Total Column Ozone 1 Month Zonal Mean L3 Global 5.0 degree Latitude Zones V1 (BUVNO4L3zm) at GES DISC, 1970-04-10 to 1976-05-01, 1 Granule]


Too many datasets to sift manually
Where does Variety come from?


Instruments 
Fundamental differences: sounders, limb sounders, imagers... 
Incremental evolution in instrument design 

Satellites 
“Same” instrument on different satellites 

Processing Level 
Calibrated -> Swath -> Grid -> Model 

Processing Algorithm 
Different basic principles 
Incremental evolution in algorithm development 

Temporal Resolution 
daily, 5-day, 8-day, monthly, yearly 

Spatial Resolution... 


Example: Time Aggregation 


[Figure: two time-averaged maps of Aerosol Optical Depth at 555 nm from the Multi-angle Imaging Spectro-Radiometer (MISR), Region 180W, 90S, 180E, 90N]

Daily: Time Averaged Map of Aerosol Optical Depth 555 nm daily 0.5 deg. [MISR MIL3DAE v4] over 2009-09-21
Monthly: Time Averaged Map of Aerosol Optical Depth 555 nm monthly 0.5 deg. [MISR MIL3MAE v4] over 2009-Sep

Note on the daily map: Selected date range was 2009-09-21 - 2009-09-21. Title reflects the date range of the granules that went into making this result.


What to do? 


Emulate the best search engines: return the 
most relevant results at the top of the list 


Relevance, à la Wikipedia:
“how well a retrieved document or set of documents meets the information need of the user”
Relevancy Ranking Heuristics


Heuristic = “rule of thumb”
Basis is 20+ years of serving satellite data
to researchers




The Content Heuristic
Got ozone? 


[Screenshot: variable listing of a data file, showing ColumnAmountO3, lat (1D), lon (1D), and RadiativeCloudFraction (Radiative Cloud Fraction, Geo2D)]
“New-and-improved” Heuristics


New-and-Improved Processing Version 


MLS/Aura Level 2 Ozone (O3) Mixing
Ratio V004 (ML2O3) at GES DISC
ML2O3 v004 - NASA/GSFC/SED/ESD/GCDC/GESDISC


2004-08-08 ongoing | 4280 Granules

MLS/Aura Level 2 Ozone (O3) Mixing
Ratio V003 (ML2O3) at GES DISC
ML2O3 v003 - NASA/GSFC/SED/ESD/GCDC/GESDISC


2004-08-08 to 2015-06-30 | 3935 Granules
New processing version is also
more likely to be up to date


MLS/Aura Level 2 Ozone (O3) Mixing Ratio (ML2O3) at GES DISC
V004: 2004-08-08 ongoing
V003: 2004-08-08 to 2015-06-30
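The processing-version rule can be sketched in a few lines of Python. This is only an illustration, not the actual EOSDIS ranking code: the `Dataset` class, its fields, and the all-or-nothing score of 1.0 versus 0.0 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical minimal record: product short name plus integer version."""
    short_name: str
    version: int


def version_scores(datasets):
    """Score 1.0 for the newest version of each product, 0.0 for older ones."""
    newest = {}
    for d in datasets:
        # Track the highest version seen for each short name.
        newest[d.short_name] = max(newest.get(d.short_name, d.version), d.version)
    return {d: 1.0 if d.version == newest[d.short_name] else 0.0 for d in datasets}


v3 = Dataset("ML2O3", 3)
v4 = Dataset("ML2O3", 4)
scores = version_scores([v3, v4])
```

Here `scores[v4]` is 1.0 and `scores[v3]` is 0.0, so V004 would sort above V003 when all else is equal.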


Newer instrument is usually better than 
previous instruments 


[Figure: Best Total Ozone Solution (DU), Total Ozone Mapping Spectrometer compared with Ozone Monitoring Instrument]
Region of Interest Overlap


Time Range Heuristic 


Datasets covering the user's full time range
are better than those covering just part of
it


[Figure: timeline, 2005 to 2010, comparing the time range of interest with the coverage of TOMS-Earth Probe ("Meh.") and the Ozone Monitoring Instrument ("Yeah!")]
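One way to turn the time range heuristic into a number is the fraction of the user's time range that a dataset covers. The sketch below is an assumption about how such a score could be computed, not the scoring actually used; the function name and the example dates are illustrative and do not reflect real instrument coverage.

```python
from datetime import date


def time_coverage(user_start, user_end, data_start, data_end=None):
    """Fraction (0.0 to 1.0) of the user's date range covered by a dataset.

    data_end=None is treated as an ongoing dataset (assumption).
    All ranges are inclusive of both end dates.
    """
    if data_end is None:
        data_end = user_end
    overlap_start = max(user_start, data_start)
    overlap_end = min(user_end, data_end)
    if overlap_end < overlap_start:
        return 0.0  # no overlap at all
    user_days = (user_end - user_start).days + 1
    overlap_days = (overlap_end - overlap_start).days + 1
    return overlap_days / user_days


want_start, want_end = date(2005, 1, 1), date(2010, 12, 31)
# Illustrative coverage periods only.
full = time_coverage(want_start, want_end, date(2004, 10, 1))  # ongoing
partial = time_coverage(want_start, want_end, date(2000, 1, 1), date(2005, 12, 31))
```

With these dates `full` is 1.0, while `partial` is well below 1.0, so the dataset covering the whole range of interest ranks higher.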
Spatial Heuristic
Data covering the user's full area are better
than those covering just part of it.
This is not as good as...
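The spatial heuristic can be sketched the same way, as the fraction of the user's bounding box that a dataset's bounding box covers. This is a simplified illustration under stated assumptions: boxes are (west, south, east, north) in degrees, neither box crosses the antimeridian, and area is measured in flat latitude/longitude degrees rather than on the sphere. The function name is hypothetical.

```python
def area_coverage(user_box, data_box):
    """Fraction (0.0 to 1.0) of user_box covered by data_box.

    Boxes are (west, south, east, north) in degrees. Assumes user_box has
    non-zero area and that neither box crosses the antimeridian.
    """
    west = max(user_box[0], data_box[0])
    south = max(user_box[1], data_box[1])
    east = min(user_box[2], data_box[2])
    north = min(user_box[3], data_box[3])
    if east <= west or north <= south:
        return 0.0  # boxes do not intersect
    user_area = (user_box[2] - user_box[0]) * (user_box[3] - user_box[1])
    return (east - west) * (north - south) / user_area


region = (-10.0, 0.0, 10.0, 20.0)  # illustrative region of interest
global_cover = area_coverage(region, (-180.0, -90.0, 180.0, 90.0))
half_cover = area_coverage(region, (0.0, 0.0, 30.0, 30.0))
```

A global dataset scores 1.0 for any region, while the second box covers only the eastern half of the region and scores 0.5.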


Community Usage Heuristic


The dataset most often used by the
community is more likely to be useful


Data Product
Aqua AIRS Level 3 Daily Standard Physical Retrieval (AIRS only)*
Aqua AIRS Level 3 Daily Standard Physical Retrieval (AIRS+AMSU)*

* Version 6
** Jan 1, 2016 - June 20, 2016
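A community usage score could be as simple as each dataset's user count divided by the largest count among the candidates. The sketch below assumes that normalization; the function name, dataset labels, and counts are made up for illustration and are not the figures from the table above.

```python
def usage_scores(user_counts):
    """Scale distinct-user counts to 0.0..1.0 relative to the most-used dataset."""
    top = max(user_counts.values(), default=0)
    if top == 0:
        return {name: 0.0 for name in user_counts}
    return {name: count / top for name, count in user_counts.items()}


# Hypothetical counts, for illustration only.
example = usage_scores({"product_a": 200, "product_b": 50})
ranked = sorted(example, key=example.get, reverse=True)
```

Here `ranked` puts `product_a` first, because it has the most users in the period.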
User Intent Heuristics


User type or intent     The most relevant datasets are...
Applications users      High spatial resolution, near-real-time
Students                Easier to use data, e.g., L3 grids in netCDF
Climate Modeler         Datasets on Climate Model Grid